Chain-of-Thought Reasoning In The Wild Is Not Always Faithful
Arcuschin, Iván | Janiak, Jett | Krzyzanowski, Robert | Rajamanoharan, Senthooran | Nanda, Neel | Conmy, Arthur
Chain-of-Thought (CoT) reasoning has significantly advanced state-of-the-art AI capabilities. However, recent studies have shown that CoT reasoning is not always faithful, i.e., CoT reasoning does not always reflect how models arrive at conclusions. So far, most of these studies have focused on unfaithfulness in unnatural contexts where an explicit bias has been introduced. In contrast, we show that unfaithful CoT can occur on realistic prompts with no artificial bias. Our results reveal non-negligible rates of several forms of unfaithful reasoning in frontier models: Sonnet 3.7 (16.3%), DeepSeek R1 (5.3%), and ChatGPT-4o (7.0%) all answer a notable proportion of question pairs unfaithfully. Specifically, we find that models rationalize their implicit biases in answers to binary questions ("implicit post-hoc rationalization"). For example, when separately presented with the questions "Is X bigger than Y?" and "Is Y bigger than X?", models sometimes produce superficially coherent arguments to justify answering Yes to both questions or No to both questions, despite such responses being logically contradictory. We also investigate restoration errors (Dziri et al., 2023), where models make and then silently correct errors in their reasoning, and unfaithful shortcuts, where models use clearly illogical reasoning to simplify solving problems in Putnam questions (a hard benchmark). Our findings raise challenges for AI safety work that relies on monitoring CoT to detect undesired behavior.
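To make the paired-question protocol concrete, here is a minimal sketch in Python, not the authors' evaluation code: the `ask` callable and the `always_yes` toy model are hypothetical stand-ins for a real chat-completion API. It flags a Yes/Yes or No/No answer pair across the two orderings as logically contradictory, which is the signature of implicit post-hoc rationalization described above.

```python
# Sketch of the paired-question consistency check; `ask` stands in for a model call.
from typing import Callable

def is_contradictory(x: str, y: str, ask: Callable[[str], str]) -> bool:
    """True if the model gives the same answer to both orderings of a strict
    comparison, which cannot both be correct."""
    ans_xy = ask(f"Is {x} bigger than {y}? Answer Yes or No.")
    ans_yx = ask(f"Is {y} bigger than {x}? Answer Yes or No.")
    yes_xy = ans_xy.strip().lower().startswith("yes")
    yes_yx = ans_yx.strip().lower().startswith("yes")
    return yes_xy == yes_yx  # Yes/Yes or No/No is a contradiction

# Toy model that always answers Yes, simulating an implicit bias toward
# agreement; a faithful model would answer Yes to exactly one ordering.
always_yes = lambda prompt: "Yes."
print(is_contradictory("the Nile", "the Amazon", always_yes))  # True
```

The rate of flagged pairs over a question set would then approximate per-model unfaithfulness rates of the kind reported above.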
I-athlon: Towards A Multidimensional Turing Test
Adams, Sam S. (IBM T. J. Watson Research Center) | Banavar, Guruduth (IBM T. J. Watson Research Center) | Campbell, Murray (IBM T. J. Watson Research Center)
While the Turing test is a well-known method for evaluating machine intelligence, it has a number of drawbacks that make it problematic as a rigorous and practical test for assessing progress in general-purpose AI. For example, the Turing test is deception-based, subjectively evaluated, and narrowly focused on language use. We suggest that a test would benefit from meeting the following requirements: focus on rational behavior, test several dimensions of intelligence, automate as much as possible, score as objectively as possible, and allow incremental progress to be measured. In this article we propose a methodology for designing a test that consists of a series of events, analogous to the Olympic Decathlon, which complies with these requirements. The approach, which we call the I-athlon, is ultimately intended to enable the community to evaluate progress towards machine intelligence in a practical and repeatable way.
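As an illustration of the scoring requirements above, the following is a hypothetical sketch, not taken from the article, of how per-event results might be combined into a single objective, incrementally trackable score; the event names, weights, and normalization to [0, 1] are all assumptions.

```python
# Decathlon-style aggregation of per-event scores into one overall score.
from dataclasses import dataclass

@dataclass
class EventResult:
    name: str      # e.g. "language", "perception", "planning"
    score: float   # objective, automated score normalized to [0, 1]
    weight: float  # relative importance of this dimension of intelligence

def iathlon_score(results: list[EventResult]) -> float:
    """Weighted mean of per-event scores; 1.0 means perfect on every event."""
    total_weight = sum(r.weight for r in results)
    return sum(r.score * r.weight for r in results) / total_weight

events = [
    EventResult("language", 0.8, 1.0),
    EventResult("perception", 0.5, 1.0),
    EventResult("planning", 0.3, 1.0),
]
print(iathlon_score(events))  # 0.533..., and progress on any one event moves it
```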
How to Write Science Questions that Are Easy for People and Hard for Computers
Davis, Ernest (New York University)
As a challenge problem for AI systems, I propose the use of hand-constructed multiple-choice tests, with problems that are easy for people but hard for computers. Specifically, I discuss techniques for constructing such problems at the level of a fourth-grade child and at the level of a high-school student. For the fourth-grade questions, I argue that questions that require an understanding of time, of impossible or pointless scenarios, of causality, of the human body, or of sets of objects, and questions that require combining facts or making simple inductive arguments of indeterminate length, can be chosen to be easy for people and are likely to be hard for AI programs in the current state of the art. For the high-school level, I argue that questions that relate the formal science to the realia of laboratory experiments or of real-world observations are likely to be easy for people and hard for AI programs. I argue that these are more useful benchmarks than existing standardized tests such as the SATs or Regents tests. Since the questions in standardized tests are designed to be hard for people, they often leave untested many aspects of what is hard for computers but easy for people.
Seven Challenges in Parallel SAT Solving
Hamadi, Youssef (Microsoft Research, 7 JJ Thomson Avenue, Cambridge CB3 0FB, United Kingdom) | Wintersteiger, Christoph (Microsoft Research, 7 JJ Thomson Avenue, Cambridge CB3 0FB, United Kingdom)
This paper provides a broad overview of the current situation in Parallel SAT Solving. A set of challenges to researchers is presented which, we believe, must be met to ensure the practical applicability of Parallel SAT Solvers in the future. All of these challenges are described informally but are put into perspective with related research results, and a (subjective) grading of difficulty is provided for each of them.
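For context, one dominant paradigm the parallel SAT literature builds on is the portfolio approach, in which differently configured sequential solvers race on the same formula. The sketch below is an illustrative toy, not code from the paper: the brute-force solver and the variable-order "configurations" are stand-ins, and real portfolio solvers additionally share learned clauses between workers.

```python
# Toy portfolio-style parallel SAT: several configurations race, first answer wins.
import itertools
from concurrent.futures import FIRST_COMPLETED, ProcessPoolExecutor, wait

Clause = tuple[int, ...]  # DIMACS-style literals: (1, -2) means x1 OR NOT x2

def brute_force(formula: list[Clause], var_order: tuple[int, ...]) -> bool:
    """Toy complete solver; the assignment enumeration order is the
    'configuration' that differentiates the portfolio members."""
    for bits in itertools.product([False, True], repeat=len(var_order)):
        assignment = dict(zip(var_order, bits))
        if all(any(assignment[abs(lit)] == (lit > 0) for lit in clause)
               for clause in formula):
            return True   # satisfying assignment found
    return False          # all assignments exhausted: unsatisfiable

def portfolio_solve(formula: list[Clause], orders: list[tuple[int, ...]]) -> bool:
    """Race one solver per configuration; since every member is complete,
    the first result is correct. A real portfolio would also kill the losers."""
    with ProcessPoolExecutor(max_workers=len(orders)) as pool:
        futures = [pool.submit(brute_force, formula, order) for order in orders]
        done, _ = wait(futures, return_when=FIRST_COMPLETED)
        return done.pop().result()

if __name__ == "__main__":
    formula = [(1, 2), (-1, 2), (1, -2)]          # satisfied by x1 = x2 = True
    print(portfolio_solve(formula, [(1, 2), (2, 1)]))  # True
```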
Artificial Intelligence: Some Legal Approaches and Implications
Various groups of ascertainable individuals have been granted the status of "persons" under American law, while that status has been denied to other groups. This article examines various analogies that might be drawn by courts in deciding whether to extend "person" status to intelligent machines, and the limitations that might be placed upon such recognition. As an alternative analysis, this article questions the legal status of various human/machine interfaces, and notes the difficulty in establishing an absolute point beyond which legal recognition will not extend.